872 research outputs found

Efficient Learning of Sparse Conditional Random Fields for Supervised Sequence Labelling

Author: Cappé Olivier
Lavergne Thomas
Sokolovska Nataliya
Yvon François
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 03/01/2010

Conditional Random Fields (CRFs) constitute a popular and efficient approach for supervised sequence labelling. CRFs can cope with large description spaces and can integrate some form of structural dependency between labels. In this contribution, we address the issue of efficient feature selection for CRFs based on imposing sparsity through an L1 penalty. We first show how sparsity of the parameter set can be exploited to significantly speed up training and labelling. We then introduce coordinate descent parameter update schemes for CRFs with L1 regularization. We finally provide some empirical comparisons of the proposed approach with state-of-the-art CRF training strategies. In particular, it is shown that the proposed approach is able to take advantage of the sparsity to speed up processing and hence potentially handle higher-dimensional models.
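To illustrate why an L1 penalty combined with coordinate-wise updates yields sparse parameter vectors, here is a minimal sketch. It is not the paper's CRF algorithm: it assumes a plain least-squares loss instead of the CRF log-likelihood, and the function names `soft_threshold` and `coordinate_descent_lasso` are hypothetical. The point it shows is that each coordinate update applies a soft-thresholding step, which sets weakly supported weights to exactly zero.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def coordinate_descent_lasso(X, y, lam, n_sweeps=100):
    """Minimise 0.5 * ||y - Xw||^2 + lam * ||w||_1, one coordinate at a time.

    Assumption: squared loss stands in for the CRF objective, purely to
    illustrate the L1 coordinate update.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_sweeps):
        for j in range(d):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho, lam) / norm if norm > 0 else 0.0
    return w


# Toy data (hypothetical): feature 0 explains y well, feature 1 barely at all.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
weights = coordinate_descent_lasso(X, y, lam=0.5)
```

In this toy setting the weight of the weak feature is driven to exactly zero, so it can be skipped in later computations; this kind of sparsity is what the abstract describes exploiting to speed up training and labelling.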

arXiv.org e-Print Archive

Crossref

An Operational SSL HF System (MILCOM 2007)

Author: Erhel Yvon
Marie François
Publication venue: HAL CCSD
Publication date: 28/10/2007

This paper presents an operational HF (3-30 MHz) system designed for single site localization (SSL) of transmitters involved in trans-horizon radio links. It associates the estimation of the directions of arrival of incident radio waves refracted by the ionosphere with ray tracing software based on the PRIME model of the channel. The direction finding processing is implemented on an array of non-identical sensors that exhibits polarization sensitivity. A specific version of the MUSIC algorithm jointly estimates the angles of arrival (azimuth and elevation) of incident waves and their polarization. Statistics of the angles of arrival (mean values and standard deviation) are the input data of the ray tracing software based on the PRIME model of the ionosphere, which computes the estimated position of the transmitter. Numerous radio links have been tested for long distances up to 2000 km. Very good agreement is observed between the exact and the estimated positions of the transmitters, with a standard localization error of less than 10% of the distance to the receiving system.

HAL-Rennes 1

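Du quatrième de proportion comme principe inductif : une proposition et son application à l’apprentissage de la morphologie [On the fourth proportional as an inductive principle: a proposal and its application to the learning of morphology]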

Author: Stroppa Nicolas
Yvon François
Publication venue: Association pour le Traitement Automatique des Langues
Publication date: 01/01/2006

We present an analogy-based learning model that exploits the notion of formal analogical proportions; this approach presupposes that these proportions can be given a meaning and that their computation can be implemented efficiently. We propose an algebraic definition of this notion, valid for the structures commonly used for linguistic representations: words over a finite alphabet, attribute-value structures, and labelled trees. We then present an application to a concrete task, which consists in learning to morphologically analyse unknown orthographic forms. Experimental results on several lexicons make it possible to assess the validity of our approach.
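As a rough illustration of an analogical proportion on words (a : b :: c : d), the sketch below solves for the fourth term in the simplest case, where a and b differ only by a suffix. This is an assumption made for brevity and not the algebraic definition proposed in the paper, which covers more general structures; the helper name `solve_suffix_analogy` is hypothetical.

```python
from os.path import commonprefix
from typing import Optional


def solve_suffix_analogy(a: str, b: str, c: str) -> Optional[str]:
    """Return d such that a : b :: c : d, for suffix-substitution analogies only.

    Returns None when c does not end with the suffix that a loses,
    i.e. when this restricted form of analogy has no solution.
    """
    stem = commonprefix([a, b])       # longest shared character prefix of a and b
    suffix_a = a[len(stem):]          # what a has after the shared stem
    suffix_b = b[len(stem):]          # what b has after the shared stem
    if not c.endswith(suffix_a):
        return None
    return c[:len(c) - len(suffix_a)] + suffix_b


# chanter : chantons :: parler : ?
fourth_term = solve_suffix_analogy("chanter", "chantons", "parler")
```

Here the pair chanter/chantons swaps the suffix "er" for "ons", and the same substitution is applied to "parler". An unknown form can be analysed in the same spirit by finding known triples with which it stands in proportion.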

Irish Universities

DCU Online Research Access Service

Measuring text readability with machine comprehension: a pilot study

Author: Benzahra Marc
Yvon François
Publication venue: HAL CCSD
Publication date: 01/08/2019

This article studies the relationship between text readability indices and automatic machine understanding systems. Our hypothesis is that the simpler a text is, the better it should be understood by a machine. We thus expect a strong correlation between readability levels on the one hand, and performance of automatic reading systems on the other hand. We test this hypothesis with several understanding systems based on language models of varying strengths, measuring this correlation on two corpora of journalistic texts. Our results suggest that this correlation is rather small and that existing comprehension systems are far from reproducing the gradual improvement of their performance on texts of decreasing complexity.
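The kind of correlation measurement described here can be sketched as follows. The abstract does not say which correlation coefficient is used, so Spearman rank correlation is an assumption, and the function names `ranks`, `pearson` and `spearman` are hypothetical helpers written for this illustration.

```python
def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

In use, `x` would hold one readability level per text and `y` the comprehension score a system obtains on that text; a value near 1 or -1 indicates a strong monotonic relationship, and a value near 0 a weak one.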

Learning the Structure of Variable-Order CRFs: a finite-state perspective

Author: Lavergne Thomas
Yvon François
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2017

The computational complexity of linear-chain Conditional Random Fields (CRFs) makes it difficult to deal with very large label sets and long range dependencies. Such situations are not rare and arise when dealing with morphologically rich languages or joint labelling tasks. We extend here recent proposals to consider variable order CRFs. Using an effective finite-state representation of variable-length dependencies, we propose new ways to perform feature selection at large scale and report experimental results where we outperform strong baselines on a tagging task

Crossref

Evaluating Subtitle Segmentation for End-to-end Generation Systems

Author: Alina Karakanta
François Buet
François Yvon
Mauro Cettolo
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2022

Subtitles appear on screen as short pieces of text, segmented based on formal constraints (length) and syntactic/semantic criteria. Subtitle segmentation can be evaluated with sequence segmentation metrics against a human reference. However, standard segmentation metrics cannot be applied when systems generate outputs different than the reference, e.g. with end-to-end subtitling systems. In this paper, we study ways to conduct reference-based evaluations of segmentation accuracy irrespective of the textual content. We first conduct a systematic analysis of existing metrics for evaluating subtitle segmentation. We then introduce Sigma, a new Subtitle Segmentation Score derived from an approximate upper-bound of BLEU on segmentation boundaries, which allows us to disentangle the effect of good segmentation from text quality. To compare Sigma with existing metrics, we further propose a boundary projection method from imperfect hypotheses to the true reference. Results show that all metrics are able to reward high quality output but for similar outputs system ranking depends on each metric’s sensitivity to error type. Our thorough analyses suggest Sigma is a promising segmentation candidate but its reliability over other segmentation metrics remains to be validated through correlations with human judgements

Archivio della ricerca - Fondazione Bruno Kessler

BiSync: A Bilingual Editor for Synchronized Monolingual Texts

Author: Crego Josep
Xu Jitao
Yvon François
Publication date: 01/06/2023

In our globalized world, a growing number of situations arise where people are required to communicate in one or several foreign languages. In the case of written communication, users with a good command of a foreign language may find assistance from computer-aided translation (CAT) technologies. These technologies often allow users to access external resources, such as dictionaries, terminologies or bilingual concordancers, thereby interrupting and considerably hindering the writing process. In addition, CAT systems assume that the source sentence is fixed and also restrict the possible changes on the target side. In order to make the writing process smoother, we present BiSync, a bilingual writing assistant that allows users to freely compose text in two languages, while maintaining the two monolingual texts synchronized. We also include additional functionalities, such as the display of alternative prefix translations and paraphrases, which are intended to facilitate the authoring of texts. We detail the model architecture used for synchronization and evaluate the resulting tool, showing that high accuracy can be attained with limited computational resources. The interface and models are publicly available at https://github.com/jmcrego/BiSync and a demonstration video can be watched on YouTube at https://youtu.be/_l-ugDHfNgU. Comment: ACL 2023 System Dem

arXiv.org e-Print Archive

Cross-lingual alignment transfer: a chicken-and-egg story?

Author: Aufrant Lauriane
Wisniewski Guillaume
Yvon François
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2016

In this paper, we challenge a basic assumption of many cross-lingual transfer techniques: the availability of word aligned parallel corpora, and consider ways to accommodate situations in which such resources do not exist. We show experimentally that, here again, weakly supervised cross-lingual learning techniques can prove useful, once adapted to transfer knowledge across pairs of languages.

Crossref

Publikationsserver der RWTH Aachen University

Reassessing the proper place of man and machine in translation: a pre-translation scenario

Author: Ive Julia
Max Aurélien
Yvon François
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Traditionally, human-machine interaction to reach an improved machine translation (MT) output takes place ex-post and consists of correcting this output. In this work, we investigate other modes of intervention in the MT process. We propose a Pre-Edition protocol that involves: (a) the detection of MT translation difficulties; (b) the resolution of those difficulties by a human translator, who provides their translations (pre-translation); and (c) the integration of the obtained information prior to the automatic translation. This approach can meet individual interaction preferences of certain translators and can be particularly useful for production environments, where more control over output quality is needed. Early resolution of translation difficulties can prevent downstream errors, thus improving the final translation quality "for free". We show that translation difficulty can be reliably predicted for English for various source units. We demonstrate that the pre-translation information can be successfully exploited by an MT system and that the indirect effects are genuine, accounting for around 16% of the total improvement. We also provide a study of the human effort involved in the resolution process.